Gender bias in language models in long-term care

Sam Rickman
Supervisors: Jose-Luis Fernandez, Juliette Malley

September 2024

Large language models (LLMs) in long-term care

Writing notes:

  • Conversation is recorded using social worker’s phone.
  • This transcript is summarised into case notes.

Information retrieval:

  • LLM generates summaries of case notes.

How widespread is this?

 

  1. June 2024 survey: 4 English local authorities (LAs) report using LLMs in LTC. [1]
  2. 43% of councils reported seeing benefits from AI in LTC. [2]
  3. Sep 2024: the websites of 6 English LAs state they use LLMs for LTC (pop. 1.6m).

Research question

  1. Is there gender bias in state-of-the-art LLMs used in long-term care?
    • Inclusion bias [3]
    • Linguistic bias [4]

The data

  1. Data from a London local authority.
  2. All adults who were:
    • Aged 65 years or over by 31 August 2020
    • Receiving community care services for at least one year since the end of 2015.
  3. 3,046 individuals (62% women).

 

Data for each of the 3,046 individuals:

  • Needs assessments
  • Services received
  • Free-text case notes
Quantity of free text data

Summarisation models

  • Large language models:
    • Llama 3 (Meta, 2024): 8bn parameters
    • Gemma (Google, 2024): 7bn parameters
  • Benchmark models:
    • T5 (Google, 2019): 220m parameters
    • BART (Meta, 2019): 406m parameters

Strategy

Use LLMs to create summaries of case notes and measure:

  1. Sentiment analysis.
  2. Inclusion bias: count of words related to themes: physical health, mental health, physical appearance, subjective language.
    • e.g. physical health list: amputate, arthritis, asthma etc.
  3. Linguistic bias: count of all words used for men and women.
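The theme counts in step 2 can be sketched as a simple lexicon match. A minimal sketch, assuming a toy lexicon — the study's full physical-health word list is longer, and `theme_counts` is a hypothetical helper, not the study's code:

```python
import re
from collections import Counter

# Illustrative theme words from the slides; the full lexicon is longer.
PHYSICAL_HEALTH = {"amputate", "arthritis", "asthma", "fall", "fracture"}

def theme_counts(summary: str, lexicon: set) -> Counter:
    """Count occurrences of theme words (lower-cased tokens) in a summary."""
    tokens = re.findall(r"[a-z]+", summary.lower())
    return Counter(t for t in tokens if t in lexicon)

female = "The text describes Mrs Smith's current situation and her care needs."
male = "Mr Smith was referred after a serious fall and a fracture."

print(sum(theme_counts(female, PHYSICAL_HEALTH).values()))  # 0
print(sum(theme_counts(male, PHYSICAL_HEALTH).values()))    # 2
```

The same counting applies to the mental-health, physical-appearance, and subjective-language lists, and (with no lexicon filter) to the all-words counts used for linguistic bias.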

Where does bias come from?

Use LLM to change gender

Mrs Smith is an 87-year-old, white British woman with reduced mobility. She lives in a one-bedroom flat. She requires support with washing and dressing. She has three care calls a day.

Mr Smith is an 87-year-old, white British man with reduced mobility. He lives in a one-bedroom flat. He requires support with washing and dressing. He has three care calls a day.
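The gender swap itself was performed by an LLM; a rule-based version conveys the idea, though it cannot resolve ambiguities such as "her" mapping to either "his" or "him" — one reason to use an LLM instead. A minimal sketch with a hypothetical `swap_gender` helper:

```python
import re

# Minimal swap table; the study used an LLM rather than rules like these.
SWAPS = {"Mrs": "Mr", "Mr": "Mrs", "woman": "man", "man": "woman",
         "She": "He", "He": "She", "she": "he", "he": "she",
         "her": "his", "his": "her", "Her": "His", "His": "Her"}

def swap_gender(text: str) -> str:
    """Replace gendered titles and pronouns word-by-word."""
    return re.sub(
        r"\b(" + "|".join(SWAPS) + r")\b",
        lambda m: SWAPS[m.group(1)],
        text,
    )

note = ("Mrs Smith is an 87-year-old, white British woman with reduced "
        "mobility. She lives in a one-bedroom flat.")
print(swap_gender(note))
```

Note that "Mrs" must precede "Mr" in the alternation (the regex tries branches left to right), and the single-pass `re.sub` avoids swapping a word back again.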

Caveat: not all notes translate

  • Domestic violence
  • Prostate cancer
  • Mastectomy

Removed notes with sex-specific body parts or domestic abuse.
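The exclusion step can be sketched as a keyword screen. The terms below are illustrative only; the study's exclusion criteria covered sex-specific body parts and domestic abuse:

```python
# Hypothetical screening terms; the study's actual exclusion list is broader.
EXCLUDE_TERMS = ("prostate", "mastectomy", "domestic violence", "domestic abuse")

def is_translatable(note: str) -> bool:
    """True if the note contains none of the excluded sex-specific terms."""
    lowered = note.lower()
    return not any(term in lowered for term in EXCLUDE_TERMS)

notes = [
    "Mrs Smith requires support with washing and dressing.",
    "Mr Jones is awaiting treatment for prostate cancer.",
]
kept = [n for n in notes if is_translatable(n)]
print(len(kept))  # 1
```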

Metrics

  1. Sentiment analysis
    • SiEBERT - a general-purpose, pre-trained, binary sentiment analysis model.
    • Regard - a pre-trained metric designed to evaluate gender bias across texts.

\[\begin{align*} \text{sentiment}_{ij} &= \beta_0 + \beta_1 \cdot \text{model}_i + \beta_2 \cdot \text{gender}_j \\ &\quad + \beta_3 \cdot (\text{model}_i \times \text{gender}_j) + \beta_4 \cdot \text{max\_tokens}_i \\ &\quad + u_{0j} + u_{1j} \cdot \text{model}_i + \epsilon_{ij} \label{eq:summarieslmm} \end{align*}\]

  1. Counts of words, themes
    • \(\chi^2\) test
    • Poisson regression

\[\begin{align*} \text{count}_{i} &= \beta_0 + \beta_1 \cdot \text{gender}_i + \beta_2 \cdot \text{max\_tokens}_i \\ &\quad + \beta_3 \cdot \text{doc\_id}_i + \epsilon_i \label{eq:worddtregression} \end{align*}\]
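The \(\chi^2\) test on a single word's counts can be computed directly: for a 2×2 table (gender × word vs. all other words) the Pearson statistic has a closed form. The per-gender token totals below are hypothetical placeholders, not figures from the study:

```python
def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# 'require' counts from the Gemma word-count table; the per-gender totals
# of all tokens are hypothetical placeholders.
require_women, require_men = 1498, 1845
total_women, total_men = 500_000, 320_000  # hypothetical

stat = chi2_2x2(require_women, total_women - require_women,
                require_men, total_men - require_men)
print(stat > 3.84)  # True: exceeds the 5% critical value for 1 df
```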

Results

Sentiment analysis: estimated marginal means (female - male)

                        Regard                      SiEBERT
Model           Estimate       t      p       Estimate       t      p
Benchmark models
  bart          -0.0036 .     -2.0   0.051     0.0094 *      2.2   0.031
  t5            -0.0049 **    -2.7   0.0072   -0.01 *       -2.3   0.019
State-of-the-art models
  llama3        -0.0021       -1.2   0.25     -0.0055       -1.3   0.20
  gemma          0.0069 ***    3.8   0.00013   0.042 ***     9.7   <0.001

Word counts

Word counts: Gemma

Word          N (women)   N (men)   p-value (adj.)
Words used more for women
  text            5042      2726    *** <0.001
  describe        3295      1764    *** <0.001
  highlight       1084       588    *** <0.001
  mention          314       136    *** <0.001
  despite          753       478    *** <0.001
  situation        819       538    *** <0.001
Words used more for men
  require         1498      1845    *** <0.001
  receive          554       734    *** <0.001
  resident         298       421    *** 0.001
  able             689       848    *** 0.005
  unable           276       373    *** 0.013
  complex          105       167    *** 0.017
  disabled           1        18    *** 0.008

Linguistic bias

Linguistic bias: Gemma

Mr. Smith has dementia and is unable to meet his needs at home.

She has dementia and requires assistance with daily living activities.

Linguistic bias: Gemma

Mr Smith is a disabled individual who lives in a sheltered accommodation.

The text describes Mrs. Smith’s current living situation and her care needs.

Linguistic bias

The man-flu effect?

Inclusion bias

Gemma: inclusion bias

Mr Smith was referred for reassessment after a serious fall and fractured bone in his neck.

The text describes Mrs Smith’s current situation and her healthcare needs.

Gemma: inclusion bias

Mr. Smith is a 78 year old man with a complex medical history.

The text describes Mrs Smith a 78-year-old lady living alone in a town house.

Llama 3

Policy implications

  • Gemma: women’s health needs are underplayed.
  • Cases are prioritised on the basis of severity.
  • Care is allocated on the basis of need.

Recommendations: regulatory clarity

  • LLM bias should be evaluated before use: gender, ethnicity etc.
  • This paper is reproducible - code available on GitHub.
  • Regulation:
    • UK Medical Device Regulation 2002 ❌
    • EU AI Act ❗
    • US FDA Software as a Medical Device ❗

Footnotes

  1. Local Government Association. State of the sector: AI research report. Technical report, 2024. URL https://web.archive.org/web/20240906174435/https://www.local.gov.uk/sites/default/files/documents/Local%20Government%20State%20of%20the%20Sector%20AI%20Research%20Report%202024%20-%20UPDATED_3.pdf. Accessed: 2024-09-06.

  2. Ibid.

  3. Steen, J. and Markert, K., 2023. Investigating gender bias in news summarization. arXiv preprint arXiv:2309.08047.

  4. Caliskan, A., Bryson, J.J. and Narayanan, A., 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), pp.183-186.
